In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, in which we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are not satisfactory as they are either asymptotic, or their convergence rates are slow and suboptimal. To address this issue, in this paper we introduce a generalization of the Frank-Wolfe (FW) method to solve the considered problem. The main idea of our method is to locally approximate the solution set of the lower-level problem via cutting planes, and then run a FW-type update to decrease the upper-level objective. When the upper-level objective is convex, we show that our method requires ${\mathcal{O}}(\max\{1/\epsilon_f, 1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-optimal for the upper-level objective and $\epsilon_g$-optimal for the lower-level objective. Moreover, when the upper-level objective is nonconvex, our method requires ${\mathcal{O}}(\max\{1/\epsilon_f^2, 1/(\epsilon_f\epsilon_g)\})$ iterations to find an $(\epsilon_f, \epsilon_g)$-optimal solution. We further prove stronger convergence guarantees under the Hölderian error bound assumption on the lower-level problem. To the best of our knowledge, our method achieves the best-known iteration complexity for the considered bilevel problem. We also present numerical experiments demonstrating the superior performance of our method compared with state-of-the-art methods.
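As a rough sketch of the update described above (the notation here is assumed rather than taken from the paper: $f$ is the upper-level objective, $g$ the lower-level objective, $\mathcal{Z}$ the lower-level constraint set, and $g_k$ a running estimate of the lower-level optimal value), one FW-type iteration over a cutting-plane relaxation of the lower-level solution set could look like:

```latex
\mathcal{X}_k = \bigl\{\, x \in \mathcal{Z} \;:\; g(x_k) + \nabla g(x_k)^{\top}(x - x_k) \le g_k \,\bigr\},\qquad
s_k = \arg\min_{s \in \mathcal{X}_k} \,\langle \nabla f(x_k),\, s \rangle,\qquad
x_{k+1} = x_k + \gamma_k\,(s_k - x_k).
```

Because the cut replaces the lower-level solution set with a half-space intersected with $\mathcal{Z}$, the linear minimization step stays tractable whenever it is tractable over $\mathcal{Z}$ itself.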
Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve on the performance of the networks involved. The Kullback-Leibler (KL) divergence is used to ensure proper knowledge transfer between the teacher and student. However, most online KD techniques exhibit bottlenecks when there is a capacity gap between the networks. When the models are trained cooperatively and simultaneously, the KL distance becomes incapable of properly aligning the teacher's and student's distributions. Alongside accuracy, critical edge-device applications are in need of well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both performance accuracy and calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
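As a hedged sketch of the balancing idea (not the paper's exact loss; the fixed weight `beta` and the temperature handling below are illustrative placeholders for whatever adaptive rule is used), the student-side distillation loss could blend the forward and reverse KL divergences between the softened teacher and student outputs:

```python
import torch.nn.functional as F

def balanced_kd_loss(student_logits, teacher_logits, beta=0.5, T=4.0):
    """Blend forward KL(teacher || student) and reverse KL(student || teacher).

    `beta` is an illustrative balancing weight; an adaptive scheme would set it
    per sample or per batch rather than keep it fixed.
    """
    s_log_probs = F.log_softmax(student_logits / T, dim=-1)
    t_log_probs = F.log_softmax(teacher_logits / T, dim=-1)

    # forward KL: the teacher distribution is the target
    fwd = F.kl_div(s_log_probs, t_log_probs.exp(), reduction="batchmean")
    # reverse KL: the student distribution is the target
    rev = F.kl_div(t_log_probs, s_log_probs.exp(), reduction="batchmean")

    return (T * T) * (beta * fwd + (1.0 - beta) * rev)
```

The reverse term is mode-seeking, so weighting it more heavily pushes the compact student toward modes it can actually represent instead of spreading mass over the full teacher distribution, which matches the intuition of shifting the training focus to the student.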
A recent explosion of research focuses on developing methods and tools for building fair predictive models. However, most of this work relies on the assumption that the training and testing data are representative of the target population on which the model will be deployed. In practice, real-world training data often suffer from selection bias and are not representative of the target population, for many reasons including the cost and feasibility of collecting and labeling data, historical discrimination, and individual biases. In this paper, we introduce a new framework for certifying and ensuring the fairness of predictive models trained on biased data. We take inspiration from query answering over incomplete and inconsistent databases to present and formalize the problem of consistent range approximation (CRA) of answers to queries about aggregate information for the target population. We aim to leverage background knowledge about the data collection process, the biased data, and limited or no auxiliary data sources to compute a range of answers for aggregate queries over the target population that are consistent with the available information. We then develop methods that use CRA of such aggregate queries to build predictive models that are certifiably fair on the target population, even when no external information about that population is available during training. We evaluate our methods on real data and demonstrate improvements over the state of the art. Significantly, we show that enforcing fairness using our methods can lead to predictive models that are not only fair, but also more accurate on the target population.
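The flavor of consistent range approximation can be illustrated with a toy interval computation (the inputs and the query below are illustrative and not the paper's formalism): if selection bias means part of a group was never observed, an aggregate over the target population can only be bounded to a range consistent with what was observed.

```python
def consistent_range_positive_rate(observed_positives, observed_total, target_total):
    """Toy consistent-range computation for a group's positive-label rate.

    Assumes we know how many target-population members of the group were never
    observed (selection bias) but nothing about their labels, so the rate is
    bounded by treating the unobserved labels as all-0 or all-1.
    """
    unobserved = target_total - observed_total
    lower = observed_positives / target_total
    upper = (observed_positives + unobserved) / target_total
    return lower, upper

# e.g. 40 positives among 100 observed, but the group has 160 members in the target population
print(consistent_range_positive_rate(40, 100, 160))  # (0.25, 0.625)
```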
Two leading causes of death in the United States and worldwide are stroke and myocardial infarction. The root cause of both is unstable atherosclerotic plaques that rupture or erode and occlude vessels of the heart (myocardial infarction) or the brain (stroke). Clinical studies show that, in plaque rupture or erosion events, plaque composition matters more than lesion size. To determine plaque composition, cells of various types in plaque lesions are counted in 3D cardiovascular immunofluorescence images. However, counting these cells manually is expensive, time-consuming, and prone to human error. These challenges of manual counting motivate the need for an automated approach to localize and count the cells in images. The purpose of this study is to develop an automated approach that accurately detects and counts cells in 3D immunofluorescence images with minimal annotation effort. In this study, we used a weakly supervised learning approach to train the HoVer-Net segmentation model with point annotations to detect nuclei in fluorescence images. The advantage of using point annotations is that they require much less effort than pixel-wise annotations. To train the HoVer-Net model with point annotations, we adopted a commonly used cluster-labeling method that converts point annotations into accurate binary masks of cell nuclei. Traditionally, these methods generate binary masks from point annotations while leaving the region around each object unlabeled (and typically ignored during model training). However, these regions may contain important information that helps determine the boundaries between cells. We therefore applied an entropy minimization loss function on these regions to encourage the model to output more confident predictions on the unlabeled areas. Our comparative study shows that, using our weakly trained HoVer-Net model...
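A minimal sketch of the loss described above, assuming a per-pixel labeled/unlabeled mask produced by the cluster-labeling step (the mask convention and the weight `lam` are assumptions, not details from the study): standard cross-entropy is applied on labeled pixels, and an entropy penalty encourages confident predictions on the unlabeled region around each nucleus.

```python
import torch
import torch.nn.functional as F

def partial_ce_with_entropy_min(logits, targets, labeled_mask, lam=0.1):
    """Cross-entropy on labeled pixels + entropy minimization on unlabeled pixels.

    logits:       (N, C, H, W) raw model outputs
    targets:      (N, H, W) integer class labels; entries at unlabeled pixels
                  must still be valid class indices (they are masked out below)
    labeled_mask: (N, H, W) bool, True where the cluster-labeling step assigned a label
    lam:          illustrative weight on the entropy term
    """
    per_pixel_ce = F.cross_entropy(logits, targets, reduction="none")        # (N, H, W)
    ce = (per_pixel_ce * labeled_mask).sum() / labeled_mask.sum().clamp(min=1)

    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs.clamp(min=1e-8))).sum(dim=1)         # (N, H, W)
    unlabeled = ~labeled_mask
    ent = (entropy * unlabeled).sum() / unlabeled.sum().clamp(min=1)

    return ce + lam * ent
```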
Colleges and universities use predictive analytics in a variety of ways to increase student success rates. Despite the potential of predictive analytics, two major barriers exist to their adoption in higher education: (a) the lack of democratization in deployment, and (b) the potential to exacerbate inequalities. Education researchers and policymakers encounter numerous challenges in deploying predictive modeling in practice. These challenges arise at different steps of modeling, including data preparation, model development, and evaluation, and each of these steps can introduce additional bias to the system if not appropriately performed. Most large-scale and nationally representative education data sets suffer from a significant number of incomplete responses from the research participants. While many education-related studies have addressed the challenges of missing data, little is known about the impact of handling missing values on the fairness of predictive outcomes in practice. In this paper, we set out to first assess the disparities in predictive modeling outcomes for college-student success, and then investigate the impact of imputation techniques on model performance and fairness using a commonly used set of metrics. We conduct a prospective evaluation to provide a less biased estimation of future performance and fairness than an evaluation of historical data. Our comprehensive analysis of a real large-scale education dataset reveals key insights on modeling disparities and on how imputation techniques impact the fairness of the student-success predictive outcome under different testing scenarios. Our results indicate that imputation introduces bias if the testing set follows the historical distribution. However, if the injustice in society is addressed and consequently the upcoming batch of observations is equalized, the resulting model would be less biased.
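As a small illustration of the kind of pipeline being audited (the imputation strategy, classifier, and single fairness metric below are illustrative choices, not the paper's full protocol), one can impute missing values, fit a model, and then report accuracy alongside a group-disparity measure:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

def fit_and_audit(X_train, y_train, X_test, y_test, group_test, strategy="mean"):
    """Impute missing values, fit a classifier, and report accuracy plus a
    simple demographic-parity gap across groups.
    """
    imputer = SimpleImputer(strategy=strategy)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(imputer.fit_transform(X_train), y_train)

    preds = clf.predict(imputer.transform(X_test))
    accuracy = (preds == y_test).mean()

    # demographic-parity gap: max difference in positive-prediction rates across groups
    rates = [preds[group_test == g].mean() for g in np.unique(group_test)]
    dp_gap = max(rates) - min(rates)
    return accuracy, dp_gap
```

Repeating this audit with different imputation strategies (and on a prospective rather than historical test split) is the kind of comparison the evaluation above describes.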
It is of critical importance to be aware of the historical discrimination embedded in the data and to consider a fairness measure to reduce bias throughout the predictive modeling pipeline. Given various notions of fairness defined in the literature, investigating the correlation and interaction among metrics is vital for addressing unfairness. Practitioners and data scientists should be able to comprehend each metric and examine their impact on one another given the context, use case, and regulations. Exploring the combinatorial space of different metrics for such examination is burdensome. To alleviate the burden of selecting fairness notions for consideration, we propose a framework that estimates the correlation among fairness notions. Our framework consequently identifies a set of diverse and semantically distinct metrics as representative for a given context. We propose a Monte-Carlo sampling technique for computing the correlations between fairness metrics by indirect and efficient perturbation in the model space. Using the estimated correlations, we then find a subset of representative metrics. Our proposed method is generic and can be generalized to any arbitrary set of fairness metrics. We showcase the validity of the proposal using comprehensive experiments on real-world benchmark datasets.
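A hedged sketch of the sampling idea (the score-perturbation scheme and the example metric below are illustrative stand-ins for the paper's model-space perturbation and metric set): each Monte-Carlo draw perturbs the model's outputs, re-evaluates every fairness metric of interest, and the correlations are estimated over the draws.

```python
import numpy as np

def equal_opportunity_gap(y_true, preds, group):
    # gap in true-positive rates across groups (assumes every group has positives)
    tprs = [preds[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

def estimate_metric_correlation(metric_a, metric_b, y_true, scores, group,
                                n_samples=200, noise=0.05, seed=0):
    """Monte-Carlo estimate of the correlation between two fairness metrics.

    Each draw perturbs the model's scores (a crude stand-in for sampling nearby
    models in model space), thresholds them, and evaluates both metrics; the
    Pearson correlation over the draws is returned.
    """
    rng = np.random.default_rng(seed)
    a_vals, b_vals = [], []
    for _ in range(n_samples):
        noisy = scores + rng.normal(0.0, noise, size=scores.shape)
        preds = (noisy >= 0.5).astype(int)
        a_vals.append(metric_a(y_true, preds, group))
        b_vals.append(metric_b(y_true, preds, group))
    return np.corrcoef(a_vals, b_vals)[0, 1]
```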
This paper addresses the problem of learning the sparse structure of Bayesian networks from high-dimensional discrete data. Compared with continuous Bayesian networks, learning a discrete Bayesian network is a challenging problem due to the large parameter space. Although many approaches have been developed for learning continuous Bayesian networks, few have been proposed for the discrete case. In this paper, we cast learning a Bayesian network as an optimization problem and propose a score function that ensures the learned structure is a sparse directed acyclic graph. Furthermore, we implement a block-wise stochastic coordinate descent algorithm to optimize the score function. Specifically, we use a variance reduction method in the optimization algorithm to make it efficient for high-dimensional data. The proposed method is applied to synthetic data from well-known benchmark networks, and the quality, scalability, and robustness of the constructed networks are measured. The results show that our algorithm outperforms several well-known competing approaches.
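In the spirit of the formulation above (the penalized form shown here is a common template for score-based sparse structure learning, not necessarily the paper's exact score), the task can be posed as minimizing a sparsity-penalized negative log-likelihood over graph structures constrained to be acyclic:

```latex
\min_{W}\; -\ell(W;\mathcal{D}) \;+\; \lambda \,\lVert W \rVert_1
\quad\text{subject to}\quad W \ \text{encodes a directed acyclic graph},
```

where $\ell(W;\mathcal{D})$ is the data log-likelihood, $W$ a (weighted) adjacency structure, and $\lambda$ controls sparsity; a block-wise stochastic coordinate descent then updates one block of parameters at a time, using variance-reduced stochastic gradients to keep the per-iteration cost manageable on high-dimensional data.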
Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distances. MixHop requires no additional memory or computational complexity, yet outperforms challenging baselines. In addition, we propose sparsity regularization that allows us to visualize how the network prioritizes neighborhood information across different graph datasets. Our analysis of the learned architectures reveals that neighborhood mixing varies per dataset.
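A minimal sketch of the mixing idea (the choice of adjacency powers, activation, and dense-matrix handling are illustrative simplifications, not the exact published architecture): each layer propagates features over several powers of the normalized adjacency and concatenates the per-power representations.

```python
import torch
import torch.nn as nn

class MixHopLayer(nn.Module):
    """Mix powers of the normalized adjacency: concat_p sigma(A_hat^p H W_p).

    Assumes a dense normalized adjacency `A_hat` (with self-loops); sparse
    matrices and the published layer sizes are omitted for brevity.
    """
    def __init__(self, in_dim, out_dim, powers=(0, 1, 2)):
        super().__init__()
        self.powers = powers
        self.linears = nn.ModuleList([nn.Linear(in_dim, out_dim) for _ in powers])

    def forward(self, A_hat, H):
        outputs = []
        for p, lin in zip(self.powers, self.linears):
            # A_hat^p H: the p-th power aggregates features from p-hop neighborhoods
            X = H
            for _ in range(p):
                X = A_hat @ X
            outputs.append(torch.relu(lin(X)))
        # concatenate the per-power representations along the feature axis
        return torch.cat(outputs, dim=-1)
```

Concatenating the p = 0, 1, 2 terms lets subsequent layers form signed combinations of neighborhoods at different radii, which is what allows difference-operator-like behavior that a single-hop GCN layer cannot express.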